Water Resources in Copenhagen during 20th century

This script visualizes the spatial component of the data accompanying the Spring 2021 course on the City: Between Culture and Nature, taught by Mikkel Thelle and Mikkel Høghøj. The course surveys the gradual appearance of private and public bathing facilities, toilets and communal hygienic resources in the city of Copenhagen during the 20th century. By editing elements in this script, you can explore different aspects of past and present hygienic amenities in the capital of Denmark.

Before we start: data wrangling

First load the packages necessary for spatial data visualisation and analysis.

library(sf)
library(tidyverse)
library(spatstat)
library(spatialkernel)
library(googlesheets4)
library(leaflet)

Next, load your spatial data - polygons representing the suburbs of Copenhagen.

suburbs <- st_read("data/bydel.shp")
Reading layer `bydel' from data source `C:\Users\Adela\Documents\RStudio\1_Teaching\BetweenCultureAndNature\data\bydel.shp' using driver `ESRI Shapefile'
Simple feature collection with 10 features and 4 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 12.45305 ymin: 55.61284 xmax: 12.73425 ymax: 55.73271
geographic CRS: WGS 84
plot(suburbs$geometry)

tail(suburbs)
Simple feature collection with 6 features and 4 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 12.46344 ymin: 55.63174 xmax: 12.73425 ymax: 55.73271
geographic CRS: WGS 84
   id bydel_nr                      navn areal_m2
5   6        6                Vanl<f8>se  6699011
6   5        5                     Valby  9234177
7   4        4 Vesterbro-Kongens Enghave  8472602
8   1        1                  Indre By 10488466
9   9        9             Amager <d8>st  9820410
10  2        2               <d8>sterbro  9858740
                         geometry
5  MULTIPOLYGON (((12.4982 55....
6  MULTIPOLYGON (((12.52434 55...
7  MULTIPOLYGON (((12.54553 55...
8  MULTIPOLYGON (((12.72897 55...
9  MULTIPOLYGON (((12.63082 55...
10 MULTIPOLYGON (((12.59777 55...
# write_rds(suburbs, 'data/CPHsuburbs.rds')

suburbs$id
 [1]  3  7  8 10  6  5  4  1  9  2
# Clean up suburb names
suburbs$navn
 [1] "N<f8>rrebro"               "Br<f8>nsh<f8>j-Husum"     
 [3] "Bispebjerg"                "Amager Vest"              
 [5] "Vanl<f8>se"                "Valby"                    
 [7] "Vesterbro-Kongens Enghave" "Indre By"                 
 [9] "Amager <d8>st"             "<d8>sterbro"              
# suburbs %>% select(navn) %>% mutate(Name=gsub('<f8>|<d8>','oe',navn))

Next attach the attribute data. I read the data from a google sheet where my colleagues and I can edit it. You can load it from there if you have a googlesheets4 package installed, or use the wc.csv in the data folder

# Uncomment the lines below to read data from GDrive wc <-
# read_sheet('https://docs.google.com/spreadsheets/d/1iFvycp6M6bF8GBkGjA2Yde2yCIhiy5_slAkGF-RUF7w/edit#gid=0',
# col_types = 'cnnnnnnnn') write_csv(wc, 'data/wc.csv')
wc <- read_csv("data/wc.csv")
wc
# A tibble: 100 x 9
   Suburb suburb_id flats wc_access  bath  year bath_communal_ct wc_communal_ct
   <chr>      <dbl> <dbl>     <dbl> <dbl> <dbl>            <dbl>          <dbl>
 1 Indre~         1 16280     11310  3800  1950               NA             NA
 2 Chris~         1  5490      3900   900  1950               NA             NA
 3 Voldk~         1 13460     12690  4560  1950               NA             NA
 4 Øster~         2 30820     28900 13750  1950               NA             NA
 5 Indre~         3 28700     24380  5910  1950               NA             NA
 6 Ydre ~         3 21710     20410  5800  1950               NA             NA
 7 Veste~         4 25850     23930  3730  1950               NA             NA
 8 Konge~         4  6270      6240  4240  1950               NA             NA
 9 Valby          5 14430     13970  8190  1950               NA             NA
10 Viger~         5  7700      7580  5050  1950               NA             NA
# ... with 90 more rows, and 1 more variable: hot_water <dbl>

Data on access to hygienic facilities and other water resources in Copenhagen now looks good and tidy, but its spatial resolution is better than the provided polygons (as in we have multiple rows that all fit within one suburb id). We therefore need to aggregate the data before we attach it to the spatial polygons

wcdata <- wc %>% group_by(year, suburb_id) %>% summarize(flats = sum(flats), bath = sum(bath), 
    pct_bath = bath/flats * 100, wc_access = sum(wc_access), pct_wc = wc_access/flats * 
        100, warmH20 = sum(hot_water), pct_wH20 = warmH20/flats * 100, communal_wc = sum(wc_communal_ct), 
    communal_bath = sum(bath_communal_ct))
wcdata
# A tibble: 50 x 11
# Groups:   year [5]
    year suburb_id flats  bath pct_bath wc_access pct_wc warmH20 pct_wH20
   <dbl>     <dbl> <dbl> <dbl>    <dbl>     <dbl>  <dbl>   <dbl>    <dbl>
 1  1950         1 35230  9260     26.3     27900   79.2    8530     24.2
 2  1950         2 30820 13750     44.6     28900   93.8   10750     34.9
 3  1950         3 50410 11710     23.2     44790   88.9    8810     17.5
 4  1950         4 32120  7970     24.8     30170   93.9    6110     19.0
 5  1950         5 22130 13240     59.8     21550   97.4   10830     48.9
 6  1950         6 10260  6780     66.1     10120   98.6    6270     61.1
 7  1950         7 27260 14790     54.3     26770   98.2   13280     48.7
 8  1950         8 19270 15000     77.8     18980   98.5   14690     76.2
 9  1950         9 23960 12470     52.0     22950   95.8   11210     46.8
10  1950        10 18000  9030     50.2     16200   90      7800     43.3
# ... with 40 more rows, and 2 more variables: communal_wc <dbl>,
#   communal_bath <dbl>
# write_rds(wcdata, 'data/CPH_wcdata.rds')

Join the data with the spatial representations

Now we can join the data with the spatial polygons

wc_spatial <- suburbs %>% merge(wcdata, by.x = "id", by.y = "suburb_id")
wc_spatial
Simple feature collection with 50 features and 14 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 12.45305 ymin: 55.61284 xmax: 12.73425 ymax: 55.73271
geographic CRS: WGS 84
First 10 features:
  id bydel_nr     navn areal_m2 year flats  bath pct_bath wc_access   pct_wc
1  1        1 Indre By 10488466 1981 26413 14035 53.13671     22546 85.35948
2  1        1 Indre By 10488466 1950 35230  9260 26.28442     27900 79.19387
3  1        1 Indre By 10488466 1965 32470 12780 39.35941     32450 99.93840
4  1        1 Indre By 10488466 1970 30440 11386 37.40473     22381 73.52497
5  1        1 Indre By 10488466 1955 33490  9740 29.08331     33430 99.82084
  warmH20 pct_wH20 communal_wc communal_bath                       geometry
1      NA       NA          NA            NA MULTIPOLYGON (((12.72897 55...
2    8530 24.21232          NA            NA MULTIPOLYGON (((12.72897 55...
3      NA       NA        9510          2280 MULTIPOLYGON (((12.72897 55...
4      NA       NA          NA            NA MULTIPOLYGON (((12.72897 55...
5    9130 27.26187          NA            NA MULTIPOLYGON (((12.72897 55...
 [ reached 'max' / getOption("max.print") -- omitted 5 rows ]
names(wc_spatial)
 [1] "id"            "bydel_nr"      "navn"          "areal_m2"     
 [5] "year"          "flats"         "bath"          "pct_bath"     
 [9] "wc_access"     "pct_wc"        "warmH20"       "pct_wH20"     
[13] "communal_wc"   "communal_bath" "geometry"     

Lets look at the data in a map.

Put the data on the map

Flats and water resources in 1950

wc1950 <- wc_spatial %>% filter(year == 1950)

library(tmap)
tmap_mode(mode = "plot")
tm_shape(wc1950) + # tm_facets(by = 'year')+
tm_borders(col = "black", lwd = 1) + tm_polygons("flats", style = "pretty") + tm_legend(legend.position = c("RIGHT", 
    "TOP")) + tm_compass(position = c("RIGHT", "BOTTOM"), type = "rose", size = 2) + 
    tm_scale_bar(position = c("RIGHT", "BOTTOM"), breaks = c(0, 2, 4), text.size = 1) + 
    tm_credits(position = c("RIGHT", "BOTTOM"), text = "Adela Sobotkova, 2021") + 
    tm_layout(main.title = "Copenhagen Flats", legend.outside = FALSE)

Flats through time

library(tmap)
tmap_options(limits = c(facets.view = 6))  # we want to view 6 years
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year") + tm_polygons("flats", style = "pretty") + 
    tm_layout(main.title = "Copenhagen Flats", legend.outside = TRUE)


## Lets look at flats per square kilometer

wc_spatial <- wc_spatial %>% mutate(area_km2 = areal_m2/1e+06, flat_per_km = flats/area_km2)
library(tmap)
tmap_options(limits = c(facets.view = 6))
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year") + tm_polygons("flat_per_km", n = 5, 
    style = "quantile")  #+
# tm_layout(main.title = 'Copenhagen Flats per sq km', legend.outside = TRUE)


## Access to toilets and baths, per suburb and sq kilometer

Lets calculate the baths and toilets available per square kilometer per each suburb

library(tmap)
tmap_options(limits = c(facets.view = 6))
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year") + # tm_borders(col = 'black', lwd = 1) +
tm_polygons("pct_bath", style = "pretty", title = "% of flats with \n access to bath")  #+
# tm_layout(main.title = 'Percentage of flats with access to a bath',
# legend.outside = TRUE)
library(tmap)
tmap_options(limits = c(facets.view = 6))
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year") + # tm_borders(col = 'black', lwd = 1) +
tm_polygons("pct_wc", style = "pretty", title = "% of flats with <br>access to WC") + 
    tm_layout(main.title = "Percentage of flats with access to WC", legend.outside = TRUE)



## You can further recalculate the number of baths per sq kilometer

wc_spatial <- wc_spatial %>% mutate(bath_per_km = bath/area_km2, wc_per_km = wc_access/area_km2)

..or continue with communal resources and warm water



# Access OSM data for Copenhagen and retrieve (whatever would be relevant?)

The OpenStreetMap contains free and open spatial data for physical features on the ground, with each features’ type being define using key:value pair tags. Each tag describes a geographic attribute of the feature being shown by that specific node, way or relation.

Use:

  • osmdata:opq() to define the bounding box of the osm request
  • osmdata:add_osm_feature() to define the key:value pairs you are looking for
  • osmdata:osmdata_sf() to retrieve the osm data.
library(osmdata)

# Create a bounding box
bb <- suburbs %>% st_transform(4326) %>% st_bbox()
plot(bb)

q <- opq(bbox = bb, timeout = 180)
qa <- add_osm_feature(q, key = "amenity", value = "public_bath")
# qb <- add_osm_feature(q, key = 'amenity',value = 'drinking_water')
qc <- add_osm_feature(q, key = "amenity", value = "shower")
qd <- add_osm_feature(q, key = "amenity", value = "toilets")
# qe <- add_osm_feature(q, key = 'amenity',value = 'water_point')
public_bath <- c(osmdata_sf(qa), osmdata_sf(qc), osmdata_sf(qd))

Clean up OSM data

Use the following code to clean the results and project them in Danish UTM.

This code:

  • removes the duplicated geometries thanks to osmdata::unique_osmdata (see the documentation for details)
  • projects into WGC84 UTM32
  • keeps the name attribute only
  • computes the centroids for the baths stored as polygons
  • Eventually, the baths outside our CPH suburbs are removed.
library(osmdata)
bath_uniq <- unique_osmdata(public_bath)

rpoint <- bath_uniq$osm_points %>% filter(!is.na(amenity)) %>% st_transform(32632) %>% 
    dplyr::select(name)

rpoly <- bath_uniq$osm_polygons %>% st_transform(32632) %>% dplyr::select(name) %>% 
    st_centroid()

baths_osm <- rbind(rpoly, rpoint)

baths_osm <- st_intersection(baths_osm, st_transform(suburbs, 32632) %>% st_geometry() %>% 
    st_union())

# transform also historical baths
baths_cph <- wc_spatial %>% st_centroid() %>% st_transform(32632) %>% mutate(radius = sqrt(bath_per_km)) %>% 
    arrange(desc(bath_per_km))

Display two maps side-by-side

Now, let’s display the results in two synchronized mapview maps:

  • one with bathing resources in suburbs
  • another one with baths extracted from OSM.
  • Use the mapview::sync function to display both maps side by side with synchronisation.
library(mapview)
map_osm <- mapview(baths_osm, map.types = "OpenStreetMap", col.regions = "#940000", 
    label = as.character(suburbs$name), color = "white", legend = FALSE, layer.name = "Baths in OSM", 
    homebutton = FALSE, lwd = 0.5)


# test map
mapview(baths_cph[, -3], map.types = "Stamen.TonerLite", cex = "radius", legend = FALSE, 
    col.regions = "#217844", lwd = 0, alpha = 0.4)
map_cph <- mapview(baths_cph[, -3], map.types = "OpenStreetMap", col.regions = "#940000", 
    color = "white", cex = "bath_per_km", legend = TRUE, layer.name = "Baths per sq km <br>in suburbs from 1970", 
    homebutton = FALSE, lwd = 0.5)

sync(map_osm, map_cph)

Improve the display

The synced map functionality is nice, but the comparison does not make much sense: OSM public bathrooms versus private bathing facilities. It might be better to combine the OSM data with the public bathhouse data that we had looked at previously in Leaflet.

We need to

  • load the data from googlespreadsheet
  • filter out missing coordinates and convert to sf object
  • project to WGS84 UTM 32
# baths <- read_sheet("https://docs.google.com/spreadsheets/d/15i17dqdsRYv6tdboZIlxTmhdcaN-JtgySMXIXwb5WfE/edit#gid=0",
#                     col_types = "ccnnncnnnc")
# write_rds(baths,"data/baths.rds")
baths <- read_rds("data/baths.rds")
names(baths)
 [1] "BathhouseName"          "Coordinates"            "Longitude"             
 [4] "Latitude"               "Quality"                "AlternativeCoordinates"
 [7] "Long"                   "Lat"                    "ErrorRadius_m"         
[10] "Notes"                 
hist_bathhouses <- baths %>% 
  dplyr::select(BathhouseName,Longitude,Latitude,Quality) %>% 
  filter(!is.na(Longitude)) %>% 
  st_as_sf(coords=c("Longitude", "Latitude"), crs = 4236)

hist_baths <- st_transform(hist_bathhouses, crs=32632)

#test map
mapview(hist_baths, map.types = "Stamen.TonerLite",
        #cex="radius", legend=FALSE,
        col.regions="#217844", lwd=0, alpha=0.4)

Now, let’s load this projected historical bathouse object in the synced map so we can compare the locations with OSM data.

library(mapview)
map_osm <-  mapview(baths_osm, map.types = "OpenStreetMap", 
        col.regions = "#940000", 
        label = as.character(suburbs$name), 
        color = "white", legend = FALSE, layer.name = "Baths in OSM",
        homebutton = FALSE, lwd = 0.5) 

map_hist <-  mapview(hist_baths, 
          map.types = "OpenStreetMap", 
        col.regions = "#940000", 
        color = "white", 
       # cex = "bath_per_km",
        legend = TRUE, 
        layer.name = "Public bathhouses, early 20th century",
        homebutton = FALSE, lwd = 0.5) 

sync(map_osm,map_hist)

# Comparing two point patterns. How do we best do it?

We have two patterns, historical and OSM data. Are they similar or dissimilar? How do the patterns of historical and current public bathhouses compare beyond a quick eyeball evaluation?

Here we might be able to use some statistical functions that contrast nearest neighbor distances or multi-distance clustering across the two groups.

We should first check the nature of data: do both patterns represent completely mapped data rather than sampled data (where the nature of sampling can affect the comparison)? If the former, one could use nearest neighbor, K-function or Monte Carlo reassignment.

For a tutorial on Kcross function, see Manny G in https://gis.stackexchange.com/questions/4484/comparing-two-spatial-point-patterns#4490

Before we try some cross-functions, we need to wrangle

But first we need to recast the baths as ppp object. Note: st_union did not work as expected (it is multiplying the features), and so I did a workaround and combined the baths sf objects. En route I found nd this neat post on unioning using Danish municipalities https://gis.stackexchange.com/questions/278818/fastest-way-to-union-a-set-of-polygons-in-r

library(spatstat)

# Prepare the ppp object

# Rebuild ppp from scratch via a combined sf object
st_coordinates(hist_baths)  # 21 coordinates
          X       Y
1  723699.5 6174443
2  724739.1 6175577
3  722878.5 6175197
4  725530.6 6175670
5  725367.9 6180709
6  723141.1 6178092
7  722416.3 6174977
8  719854.3 6174236
9  723260.2 6177325
10 724289.1 6179385
11 721767.2 6176616
12 721704.2 6176980
13 724917.2 6184407
14 724886.9 6180620
15 725367.9 6180709
16 723455.4 6175714
17 723761.8 6176276
18 717811.3 6179209
19 719353.8 6179027
20 721196.5 6178563
21 725367.9 6180709
st_coordinates(baths_osm)  # 166 coordinates
           X       Y
1   725850.1 6177703
2   726493.8 6175590
3   726459.3 6175654
4   725095.5 6176950
5   725103.7 6176944
6   724953.2 6177691
7   725034.9 6174979
8   724863.2 6176972
9   725852.9 6175423
10  724030.8 6178529
11  720642.1 6178241
12  724272.1 6178188
13  724093.6 6176268
14  725328.6 6171078
15  725615.2 6175893
16  723943.1 6175166
17  722098.1 6174087
18  724776.9 6176260
19  730341.6 6181001
20  730320.1 6181110
21  730172.2 6181150
22  726178.4 6176088
23  724733.4 6180146
24  721246.1 6174057
25  719828.4 6178706
26  722955.9 6174727
27  723389.2 6176931
28  723352.3 6174642
29  724960.3 6174971
30  724989.1 6175004
31  724958.6 6174972
32  721255.6 6180747
33  728845.7 6174575
34  729144.4 6174067
35  729568.9 6173671
36  725908.7 6177944
37  724345.4 6178825
 [ reached getOption("max.print") -- omitted 129 rows ]
combined <- data.frame(rbind(st_coordinates(hist_baths), st_coordinates(baths_osm)))

# Now I am ssigning marks which need to be a factor
combined$name <- factor(c(rep("H", 21), rep("O", 166)))

# Create an sf object out of the dataframe
b_c <- st_as_sf(combined, coords = c("X", "Y"), crs = 32632)

# Convert into a marked ppp and confirm by plotting
b_ppp <- as.ppp(b_c)
plot(split(b_ppp))

## Nearest Neighbour Cross Function and Simulation We randomly reassign marks (H, O) within the combined point dataset and then calculate nearest neighbor between the randomly replaced marked points. Run the simulation 999 times

nn.sim <- vector()  #create container for simulation data
P.r <- b_ppp
for (i in 1:999) {
    marks(P.r) <- sample(b_ppp$marks)  # Reassign labels at random, point locations don't change
    nn.sim[i] <- mean(nncross(split(P.r)$O, split(P.r)$H)$dist)
}

Compare NN - simulation results visually

hist(nn.sim, breaks = 30)
abline(v = mean(nncross(split(b_ppp)$O, split(b_ppp)$H)$dist), col = "red")

### Compute empirical cumulative distribution

nn.sim.ecdf <- ecdf(nn.sim)

See how the original stat compares to the simulated distribution

nn.sim.ecdf(mean(nncross(split(b_ppp)$O, split(b_ppp)$H)$dist))
[1] 0.995996

Ripley-K cross function

Maybe we should look at the multi-scale approach to the bathhouses. Check out the Ripley K’cross-function blog and tutorial present at https://github.com/jlevente/publications/tree/master/cross-k

# Set intervals for moving window (you don't have to)
rc <- seq(0, 3000, 100)

# Run the Kcross function
kcross <- Kcross(b_ppp, i = "H", j = "O", r = rc, correction = "none")
plot(kcross)

How to explain this chart? It seems that the OSM baths cluster around historical baths, or are attracted to them even at distances. Or in other words, the ‘O’ events are closer to ‘H’ events that we would expect under complete spatial randomness.

How do we test for statistical significance? The question here is whether the H and O events are similarly clustered or not? Statistical difference can be tested with MC simulation with random labelling of points as O or H type (keeping original ratios) and computing the same cross K-function. The simulation mean and the established simulation envelopes tell us whether the observed between-type pattern is statistically significant or not.

kmult <- envelope(b_ppp, fun=Kcross,
                  nsim=100, i="H", j="O",
                  #r=rc, 
                  correction='none',
                  simulate=expression(rlabel(b_ppp)))  # are the two patterns similarly clustered or dispersed at different scales
Generating 100 simulations by evaluating expression  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.

Done.
plot(kmult, main="Cross-K function")

An observed curve within the confidence envelopes means that no matter how we group the points into categories, the pattern we identified in the previous step (by checking on the observed and theoretical values) doesn’t change when randomly assigning events into categories. Here the curve falls outside of the confidence envelopes, meaning that there are differences between the point categories.